PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine
Authors
Abstract
Accurately identifying bacteriophage virion proteins from uncharacterized sequences is important for understanding interactions between a phage and its host bacteria, which in turn supports the development of new antibacterial drugs. However, identifying such proteins experimentally is expensive and often time-consuming; hence, an efficient computational algorithm for predicting phage virion proteins (PVPs) prior to in vitro experimentation is needed. Here, we describe a support vector machine (SVM)-based PVP predictor, called PVP-SVM, which was trained with 136 optimal features. A feature selection protocol was employed to identify the optimal features from a large set that included amino acid composition, dipeptide composition, atomic composition, physicochemical properties, and chain-transition-distribution. PVP-SVM achieved an accuracy of 0.870 during leave-one-out cross-validation, which was 6% higher than control SVM predictors trained with all features, indicating the efficiency of the feature selection method. Furthermore, PVP-SVM displayed superior performance compared to the currently available method, PVPred, and two other machine-learning methods developed in this study when objectively evaluated with an independent dataset. For the convenience of the scientific community, a user-friendly and publicly accessible web server has been established at www.thegleelab.org/PVP-SVM/PVP-SVM.html.
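The abstract names amino acid composition and dipeptide composition among the candidate feature groups. As a rough illustration only, the sketch below shows how these two encodings are commonly computed from a protein sequence. The function names are hypothetical, and this is not the authors' implementation; the other feature groups, the selection of the 136 optimal features, and the SVM training are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Fraction of each standard residue in seq (20 values).

    Non-standard symbols count toward the length but get no feature.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among adjacent pairs (400 values)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence needs at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = len(seq) - 1  # number of overlapping adjacent pairs
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
    return [counts[p] / total for p in DIPEPTIDES]


def featurize(seq):
    """Concatenate both encodings into one 420-dimensional feature vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

A vector produced by the hypothetical `featurize` could then be passed to any SVM library, with feature selection applied beforehand as the abstract describes.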
Similar resources
Bubble Pressure Prediction of Reservoir Fluids using Artificial Neural Network and Support Vector Machine
Bubble point pressure is an important parameter in equilibrium calculations of reservoir fluids and has other applications in reservoir engineering. In this work, an artificial neural network (ANN) and a least squares support vector machine (LS-SVM) have been used to predict the bubble point pressure of reservoir fluids. Also, the accuracy of the models has been compared to two-equation stat...
Carbon Monoxide Prediction in the Atmosphere of Tehran Using Developed Support Vector Machine
Air quality prediction is highly important in view of the health impacts caused by exposure to air pollutants in urban air. This work presents a model based on the support vector machine (SVM) technique to predict daily average carbon monoxide (CO) concentrations in the atmosphere of Tehran. Two types of SVM regression models, i.e. ε-SVM and ν-SVM techniques, were used to predict average daily C...
PREDICTION OF SLOPE STABILITY STATE FOR CIRCULAR FAILURE: A HYBRID SUPPORT VECTOR MACHINE WITH HARMONY SEARCH ALGORITHM
The slope stability analysis is routinely performed by engineers to estimate the stability of river training works, road embankments, embankment dams, excavations and retaining walls. This paper presents a new approach to build a model for the prediction of slope stability state. The support vector machine (SVM) is a new machine learning method based on statistical learning theory, which can so...
Prediction of true critical temperature and pressure of binary hydrocarbon mixtures: A Comparison between the artificial neural networks and the support vector machine
Two main objectives have been considered in this paper: providing a good model to predict the critical temperature and pressure of binary hydrocarbon mixtures, and comparing the efficiency of the artificial neural network algorithms and the support vector regression as two commonly used soft computing methods. In order to have a fair comparison and to achieve the highest efficiency, a comprehen...